ABSTRACT

REPs are highly repeated intergenic palindromic
sequences often clustered into structures called
BIMEs, which comprise two individual REPs separated by
a short linker of variable length. They play a variety of
key roles in the cell. REPs also resemble the subterminal
hairpins of the atypical IS200/IS605 family of
insertion sequences, which encode Y1 transposases
(TnpA IS200/IS605). These belong to the HUH endonuclease
family, carry a single catalytic tyrosine (Y)
and promote single-strand transposition. Recently, a
new clade of Y1 transposases (TnpA REP) was found
associated with REP/BIME in structures called
REPtrons. It has been suggested that TnpA REP is
responsible for REP/BIME proliferation over
genomes. We analysed and compared REP
distribution and REPtron structure in numerous
available E. coli and Shigella strains. Phylogenetic
analysis clearly indicated that tnpA REP was
acquired early in the species radiation and was lost
later in some strains. To understand REP/BIME
behaviour within the host genome, we also studied
E. coli K12 TnpA REP activity in vitro and demonstrated
that it catalyses cleavage and recombination of
BIMEs. While TnpA REP shared the same general
organization and similar catalytic characteristics
with TnpA IS200/IS605 transposases, it exhibited
distinct properties potentially important in the
creation of BIME variability and in their amplification.
TnpA REP may therefore be one of the first examples of
transposase domestication in prokaryotes.

INTRODUCTION 

Repeated extragenic palindrome (REP) or Palindromic
unit (PU) sequences were identified nearly 30 years ago
in the intergenic regions of enterobacterial genomes (1).
They play a variety of key roles in the cell. They are
involved in regulating gene expression (by functioning as
transcription terminators, by stabilizing mRNA and by
acting as topological insulators for transcription-induced
positive supercoiling) (2-5), and in structuring DNA
(by binding proteins such as IHF, PolI and DNA
gyrase) (6-9). They are also specific target sites for
several bacterial insertion sequences (10-12).

REPs are between 20- and 40-nt long, often clustered in
structures called bacterial interspersed mosaic elements
(BIMEs) as two tandem inverted copies separated by
linkers, and have now been identified in a large number
of bacterial genera and species where they are often found
in high copy number (12-18). There are about 600 copies
in Escherichia coli representing ~1% of the genome
(15,19) and over 1600 copies in Stenotrophomonas
maltophilia (17). The ubiquitous nature of REPs and
their multiplicity raise the important question of how
they have expanded to populate their host genomes
and have evolved their present multiple roles.

A clue to this may lie in members of a class of atypical
bacterial insertion sequences (IS), the IS200/IS605 family,
whose ends strongly resemble REPs. These ISs carry
REP-like subterminal hairpins or imperfect palindrome
(IP) secondary structures which are recognized and bound
by the IS-specific transposase. They use a transposase,
TnpA, of the HUH endonuclease family with a single
catalytic tyrosine (Y1) as an attacking nucleophile and
transpose using obligatory single-stranded (ss) DNA
intermediates (20-23). We (ISfinder: www-is.biotoul.fr) and
others (24) have identified a group of proteins,
TnpA REP, closely related to IS200/IS605 transposases and
associated with REP sequences but forming a distinct
clade defining a separate Y1 family. TnpA REP occurs in
a variety of bacterial species and genera and is always
flanked by REP/BIME sequences. Its presence appeared
to be correlated with an increased abundance of REPs in
the corresponding genomes, suggesting that TnpA REP may
be responsible for REP proliferation throughout their host
genomes (24). The molecular mechanism generating these
patterns is unknown.

In vitro and in vivo studies of two IS200/IS605 family
members, IS608 and ISDra2, have provided a detailed
picture of their transposition. This family differs profoundly
from 'classical' ISs: its members do not include terminal
inverted repeats (IRs) and do not generate direct flanking
target repeats (DRs) on insertion. Cleavage occurs at some
distance from the IPs (25,26) via a transient covalent
5'-phosphotyrosine linked intermediate with the substrate
DNA, leaving a free 3'-OH group on the other side of
the DNA break. DNA cleavage also requires a divalent
metal ion coordinated by two histidine residues,
constituting the HUH motif, together with a third
residue located close to the catalytic tyrosine (23,27).
Transposition is strand-specific: TnpA IS608/ISDra2 recognises
only the 'top' strand which undergoes strand
cleavage and transfer to the target site. The 'bottom'
strand is inactive. The cleavage site at both left and right 
ends is not recognized directly by TnpA but forms a set of 
hydrogen bonds with a short guide sequence located 5' to 
the foot of the left and right subterminal IP (23,28,29). 
This recognition is essential for cleavage. Finally, 
excision and insertion occur preferentially at the lagging 
strand template in replication forks (30). 

To address how BIMEs might invade and amplify 
within a genome, we first analysed BIME distribution 
and polymorphisms in the genomes of 44 assembled 
E. coli and Shigella strains. We also identified a single
locus in a majority of the 110 available E. coli and
Shigella genome sequences where a single tnpA REP gene
is located. Phylogenetic analysis suggested that tnpA REP
was acquired early in the radiation of the species into
present-day strains. The gene is bordered by variable
numbers of BIMEs in structures called REPtrons, similar
to those of IS200/IS605 family members.
However, REPtrons do not appear to transpose as a unit
but the BIMEs themselves are likely to be mobile and may
have spread in a two-step process: transposition/recombination,
which generates the observed sequence diversity of
BIMEs, followed by local amplification.

To determine whether TnpA REP might be involved in 
this process, we analysed its cleavage and recombination 
activity in vitro. While TnpA REP shared similar catalytic
characteristics with TnpA IS200/IS605 transposases, it
exhibited distinct properties potentially important in the 
creation of BIME variability and in their amplification. In 
the light of these observations, we discuss the possible role 
of TnpA REP in generating variability and in proliferation 
of BIMEs throughout their host genomes. 

To our knowledge, REP/BIME and TnpA REP probably 
represent the first example of bacterial transposable 
element domestication. 



MATERIALS AND METHODS 

Bioinformatic procedures 

Transposase identification and analyses. The primary
transposase sequence of representative elements was
used as a query in a BLASTP (31) search among all
complete and partial prokaryotic genome sequences
available on the NCBI server. All apparently full-length
transposases were retained. Recursive BLASTP searches
were performed using the least conserved retrieved
sequences, i.e. those with the lowest BLAST score. The
procedure was terminated when the results converged to 
a final stable data set (no new transposase sequences were 
detected). BLASTP searches were performed on the NCBI 
BLAST online interface without the low complexity 
filter but with otherwise default parameters. Multiple 
alignments were carried out using either ClustalW (32) 
or MultAlin (33) and some displays were obtained using 
the Jalview alignment editor (34). 

In a second step, we used the Markov Cluster 
Algorithm (MCL) (http://micans.org/mcl/) (35,36) to 
weigh relationships between protein clusters. An inflation
factor (IF) of 1.2 was used and edges with BLASTP
score values of <30 were filtered out (only scores > 30 were retained).
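
The edge-filtering step that precedes clustering can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tuple layout and the toy protein labels are hypothetical, and the treatment of a score of exactly 30 (dropped here) is a guess, since the text gives only "<30" and "> 30".

```python
# Minimal sketch: drop weak BLASTP edges before handing the graph to MCL.
# All names and the example data are hypothetical, not taken from the study.

def filter_edges(edges, min_score=30.0):
    """Keep (query, subject, score) edges whose score exceeds min_score."""
    return [(q, s, score) for q, s, score in edges if score > min_score]


def to_abc(edges):
    """Format edges as tab-separated 'label label weight' lines (ABC format)."""
    return "\n".join(f"{q}\t{s}\t{score}" for q, s, score in edges)


toy_edges = [
    ("TnpA_1", "TnpA_2", 55.0),   # retained
    ("TnpA_1", "TnpA_3", 12.5),   # filtered out (score below threshold)
    ("TnpA_2", "TnpA_3", 30.0),   # filtered out under the strict '>' assumption
]
kept = filter_edges(toy_edges)
abc_text = to_abc(kept)
# The resulting text could then be clustered with the mcl program,
# e.g. `mcl edges.abc --abc -I 1.2 -o clusters.txt` (inflation factor 1.2).
```

The inflation factor is left to the clustering call itself; the sketch only prepares the weighted edge list.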

REP identification. The GenBank files of the complete 
bacterial genomes used in this study were retrieved from 
the NCBI public repository
(http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi).

REP identification was achieved with a combination of 
two methods: a method based on local alignments and 
a method based on sequence profiles. 

For the first approach, consensus sequences were
derived from Bachellier et al. (19) and transformed to follow
our sequence convention. They cover the three REP
families y, z1 and z2 and each was used as a query for
BLASTN (31) similarity searches on DNA sequences of
the complete genomes. Since we are dealing with short and
variable sequences, we set BLASTN parameters to permissive
values to increase sensitivity (expectation value < 10,
word size = 4, reward for a nucleotide match = 1, penalty
for a nucleotide mismatch = -1, cost to open a gap = -1).
BLASTN is fast and selective but, since it produces a local
alignment, predicted REPs can be shorter than expected.
In addition, the observed nucleotide
variation at each position is not included in the alignment
scoring since it can decrease the sensitivity of the prediction.
Thus, a second approach to predict motifs containing
gaps was used, based on the GLAM2 program (37).
Profiles for each REP family were built by applying the
GLAM2 program to unambiguous full-length REPs predicted
by the previous BLASTN searches. The
GLAM2SCAN program was then used to find occurrences
of the GLAM2 motifs in target sequences.

To set the parameters for both approaches, we used the
annotation of BIMEs in E. coli K12 (NC_000913.2) (available
at: http://www.pasteur.fr/recherche/unites/pmtg/repet/index.html)
as a training set. The estimated
number of identified REPs corresponds to the combined 
results of both approaches. 
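
One plausible way to combine the two prediction sets is to merge overlapping genomic intervals so that a REP found by both methods is counted once. The text does not state how the results were combined, so the sketch below is an assumption; the function name and the coordinates are hypothetical.

```python
# Minimal sketch, assuming predictions are (start, end) coordinates on the
# same strand and that overlapping predictions describe the same REP.

def merge_predictions(blast_hits, glam_hits):
    """Return the union of two interval lists with overlaps merged."""
    intervals = sorted(list(blast_hits) + list(glam_hits))
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it to the outer boundary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


blast_hits = [(100, 130), (500, 528)]   # hypothetical local-alignment hits
glam_hits = [(95, 135), (900, 935)]     # hypothetical profile-scan hits
combined = merge_predictions(blast_hits, glam_hits)
rep_count = len(combined)
```

Taking the outer boundary of overlapping hits also addresses the shortened boundaries that local alignment can produce.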

Plasmid construction and TnpA REP purification 

Escherichia coli MG1655 tnpA REP was cloned with a 6-His
extension under control of promoter p ara in pBS176.
Expression and purification were carried out in E. coli
K12 (Rosetta, DE3) (Novagen) on Ni-agarose (Qiagen) as
described for TnpA IS608 (25). Plasmid pBS180 and derivatives
were constructed in several steps: the MG1655
REPtron region was first isolated by PCR directly from
MG1655 genomic DNA and cloned into pBluescript,
pSK. The tnpA REP gene was replaced by a CmR cassette
and the downstream BIMEs were then removed by iPCR.
In pBS180mut, mutations of the conserved GTAG
were introduced using the QuikChange Site-directed
Mutagenesis Kit (Stratagene) and ssDNA was prepared
using f1 helper phage as described by the supplier
(Promega). Further details can be obtained on request.

Reactions in vitro 

Oligonucleotides were 5'-end-labelled with [γ-33P] ATP
(Perkin Elmer) using T4 polynucleotide kinase (NEB
Inc.) or, in experiments to identify a 5'-phosphotyrosine
transposase-substrate intermediate, 3'-end-labelled with
[α-32P] dATP Cordycepin (Perkin Elmer) using Terminal
Transferase (NEB Inc.). Labelled oligonucleotides were
purified on a G25 column (GE Healthcare).

Double-stranded DNA was prepared by hybridization 
of complementary oligonucleotides. After 5-min denaturation
at 95°C, the mixture was left to cool slowly to 25°C.

5'-Labelled oligonucleotide (0.02 µM) and unlabelled
oligonucleotide (0.5 µM) were incubated with 2 and 4 µM
TnpA REP (45 min, 37°C, final volume 10 µl) in 12.5 mM
Tris (pH 7.5), 120 mM NaCl, 1 mM DTT, 20 µg/ml BSA,
0.5 µg of poly-dIdC and 7% glycerol in the presence or
absence of 5 mM MnCl2 or MgCl2. Reactions were
separated on an 8% native gel in TAE buffer, to detect
retarded complexes, or on a 9% denaturing gel, to detect
cleavage and recombination products, and analysed by
phosphorimaging (Fuji). In reactions to detect covalent
complex formation, 3'-labelled substrates were incubated
with TnpA REP in the reaction mixture as described above
and reactions were separated on a 16% SDS-PAGE gel.

Cleavage sites were generally determined by comparing 
the band position in the sequencing gel with a sequence 
ladder. For certain small cleavage products, oligonucleo- 
tides of the presumed size and sequence were synthesized 
and used for comparison. 

Primer extension 

In vitro reaction mixtures were treated with Proteinase K,
purified using the Promega PCR purification Kit and used
as template for primer extension with 5'-end-labelled
'a' and 'b' primers and Taq polymerase (35 cycles of
94°C for 45 s, 52°C for 45 s and 72°C for 1 min).

a: GTAAAACGACGGCCAGT. 

b: GCAGAACTGATCCGCTATGT. 



RESULTS 

REPs, BIMEs and REPtrons: sequence, distribution and
organization in E. coli 

Figure 1. Escherichia coli REPs and BIMEs. (A) Consensus sequences
of E. coli y, z1 and z2 REPs. The conserved tetranucleotide GTAG is
boxed in violet, conserved positions are in red. Complementary
sequences (iREP) corresponding to each category are presented on the
right. The CTAC tetranucleotide, complementary to the conserved
GTAG sequence, is boxed in green. Two base mismatches in the
hairpin stem are boxed. The red horizontal arrows indicate complementary
regions able to pair. Nucleotides in blue indicate bases that differ
from one REP to another but nevertheless retain complementarity. (B)
Structure of REP and iREP. Violet and green boxes represent the
GTAG and CTAC, respectively. Black arrows indicate REP orientation.
Red indicates a y REP and blue a z1 or z2 REP. Dark and light
colours indicate REP and iREP, respectively. (C) Structures of
BIME-1 and BIME-2. The reader is referred to Bachellier et al. (19).
BIMEs are composed of a REP and an iREP separated by long (L) or
short (S) linkers. H-H and T-T represent head-to-head and tail-to-tail
configurations.

Figure 1 shows the sequence organization and previously
defined nomenclature of E. coli REP and BIME elements
(19): they are 30- to 40-nt long and could fold into an
imperfect palindrome (IP) with a highly conserved 
tetranucleotide, GTAG, localized 5' to the IP foot 
(Figure 1A and B). There are three major types of 
E. coli REP sequence, y, z1 and z2 (Figure 1A). Only 84
REPs among 584 identified in E. coli K12 are single
occurrences (19). Others are organized in pairs (BIME)
including two REPs in inverse orientation separated by
linker sequences (Figure 1B and C): one, in the orientation
including the 5' GTAG tetranucleotide, called REP, and a
second inverted sequence called iREP (Figure 1). For
functional reasons (see below), the sequence convention
used here is inverted relative to that of Bachellier et al. (19).
Escherichia coli BIMEs were classified into three families
(38). BIME-1 are composed of z1 and y (Figure 1) and
occur as single copies in which the REP and iREP are
separated by a long linker (L). BIME-2 are composed of
z2 and y. They occur as multiple tandem copies with the
REP and iREP components separated by a short linker, S,
and with one of three types of flanking sequence, s, l or r.
So-called atypical BIMEs are chimeras of BIME-1 and
BIME-2, carrying different combinations of y, z1, z2, L,
S, s, l and r. Like BIME-2, they also occur in multiple
copies. The BIME-1 L linker is well conserved and
frequently carries an IHF binding site (6) while those of
BIME-2 and atypical BIMEs vary both in length and
sequence. This can be seen among BIME-2 and atypical
BIME copies carried by the MG1655 genome
(Supplementary Figure S1). BIMEs also vary in number
from locus to locus in a single strain. However, the 
sequence of tandem BIME copies at any one locus is 
well conserved (Supplementary Figure S2). Moreover, in 
different E. coli strains, the number of tandem BIMEs at 
a given locus is also variable [Supplementary Figure S3, 
see also (39)]. 
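
The REP/iREP relationship described above is a reverse-complement relationship, which is why the conserved GTAG of a REP appears as CTAC in the corresponding iREP. The short sketch below illustrates this; the function names and the toy sequence are hypothetical, and the orientation test (GTAG at the 5' end, CTAC at the 3' end) is a deliberate simplification of the real elements.

```python
# Minimal sketch of the REP / iREP orientation convention used in the text.
# The example sequence is a toy string, not a genomic REP.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orientation(seq):
    """Classify a toy element by the position of its conserved tetranucleotide."""
    seq = seq.upper()
    if seq.startswith("GTAG"):
        return "REP"
    if seq.endswith("CTAC"):
        return "iREP"
    return "unknown"


toy_rep = "GTAGACGT"                      # hypothetical REP-like string
toy_irep = reverse_complement(toy_rep)    # the same element read on the other strand
```

Because the two components of a BIME are inverted with respect to each other, each strand of a BIME carries one element in each orientation.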

Identification of tnpA REP among members of the 
E. coli/ Shigella group 

We (ISfinder: www-is.biotoul.fr) and others (24) have
independently identified a group of proteins (TnpA REP),
closely related to IS200/IS605 transposases, associated
with REP sequences. TnpA REP occurs in a variety of
bacterial species and genera and is always flanked by
REP/BIME sequences.

We analysed 110 E. coli and Shigella genomes available
in the PATRIC database (40) for the presence of tnpA REP
and focused on its immediately surrounding region
(Supplementary Figure S4 and 'Discussion' section).
Two-thirds (74/110) carried tnpA REP located at a unique
position on the circular chromosome between yafL and
fhiA, even in strains ATCC8739 (CP000946), DH1
(CP001637) and BL21 (AM946981) in which the entire
region has undergone an inversion.

Figure 2A shows the distribution of REP and BIME
elements in the tnpA REP-carrying region from selected
E. coli strains. While the left (5') side invariably carried
a single BIME, the right (3') side included a variable number of
REPs and BIMEs (e.g. MG1655, APEC O1, ED1a and
O157:H7). In some strains, IS-mediated rearrangements
had occurred (e.g. UMN026) (Figure 2A). Albeit more
complex, these structures, called REPtrons, resemble
members of the IS200/IS605 family of bacterial insertion
sequences. In 7 strains lacking tnpA REP, all belonging
to the B2 clade of the E. coli/Shigella group (41), the
surrounding REP and BIME were still present but
tnpA REP had been precisely excised (e.g. CFT073). In 24
strains, all except two belonging to the B1 clade (41), no
trace of tnpA REP or associated BIMEs was found but,
instead, these had acquired the toxin-antitoxin genes
hicA and hicB between yafL and fhiA (e.g. IAI1)
(Figure 2A and Supplementary Figure S4). Mapping
tnpA REP on the E. coli/Shigella phylogenetic tree
(Supplementary Figure S4) suggested that the gene was
acquired early in the species radiation, at least at the
E. fergusonii and E. coli/Shigella separation, and was lost
later in some strains by two distinguishable events:
either by replacement (together with its flanking BIMEs)
with hicA and hicB or by precise deletion while retaining
the flanking BIMEs.

We identified REP elements in 44 complete E. coli and 
Shigella genomes as well as in the available genome of 
Escherichia fergusonii. The estimated number of REPs in 
E. coli and Shigella genomes varied between 286 and 574 
with an average number of 422 ± 74. This large dispersion 
reflects the presence of two subgroups of genomes showing 
extreme REP frequencies: the first group with a higher
REP frequency (on average 546) corresponds to
strains from the same subtree including clade A; the
second group, composed of two E. coli strains (SMS-3-5
and IAI39) from clade F and Shigella dysenteriae Sd197,
displays fewer than 310 REPs. The other genomes have an
estimated number of REPs correlated to the genome
size and centred around 395 ± 32. This group includes
strains with and without tnpA REP. From these results,
it is difficult to determine whether tnpA REP plays a
role in REP amplification or maintenance or whether
the loss of the gene is too recent to have had an observable
effect on REP copy number. However, as only
216 REP elements have been identified in the 
E. fergusonii genome (which does not carry the 
REPtron), this distribution and the tree topology 
suggest that the large majority of REPs have arisen 
after the acquisition of the REPtron in the ancestor 
and before the divergence of E. coli/Shigella strains 
(Supplementary Figure S4). 

TnpA REP and TnpA IS200/IS605 form two different families

Figure 2B shows an alignment of a representative group of
10 TnpA IS200/IS605 sequences (ISfinder) and a group of
10 TnpA REP sequences from the public databases. All
retain a conserved tyrosine (Y) and the HUH amino
acid triad (histidine-hydrophobic residue-histidine)
typical of the Y1 transposase catalytic site. They also
include a conserved asparagine (N), located four
residues from the catalytic Y, replacing a glutamine (Q)
residue of TnpA IS200/IS605 involved in divalent metal ion
coordination (27). However, TnpA REP group members are
generally longer, include an additional C-terminal domain
compared to TnpA IS200/IS605 and exhibit specific conserved
amino acid blocks throughout, in particular in the central
and the C-terminal domains (24). MCL (Markov
Clustering) analysis (35,36) of TnpA IS200/IS605 and
TnpA REP sequences also indicated that they represent
two distinct Y1 families ('Materials and Methods'
section). Thus, TnpA REP has sequence features suggesting
that it may be involved in catalysis of REP invasion
and dispersal within genomes.

TnpA REP activity in vitro 

To gain insight into the potential role of TnpA REP in REP
proliferation, E. coli tnpA REP was cloned with a His6
carboxy-terminal tail under control of the p ara promoter.
The resulting protein, expressed from pBS176, was
purified on Ni-agarose resin ('Materials and Methods'
section) and tested for binding and catalytic activities.

Figure 2. Escherichia coli phylogenetic tree, REPtron structure and TnpA REP alignment. (A) E. coli REPtron distribution and organization: the left
of the figure represents, for clarity, a simplified phylogenetic tree of the Escherichia coli/Shigella group obtained by pruning the tree shown in
Supplementary Figure S4 which was retrieved from the PATRIC database [http://www.patricbrc.org/portal/portal/patric/Home; (44)]. The clades
(A, B1, B2, D, E, F and S) are from (41). The right of the figure shows examples of REPtrons from representative members of each clade. tnpA REP is
shown in grey, the flanking genes yafL and fhiA in green and in violet, respectively. Arrows represent the direction of transcription. Flanking BIMEs
are shown with the same convention as in Figure 1. The hicA and hicB genes are also indicated as black and blue arrows, respectively. (B) Alignment
of TnpA IS200/IS605 and TnpA REP. TnpA from IS200/IS605 family members are boxed in red while TnpA REP derivatives are boxed in green.
Conserved positions are boxed as deep blue and less well-conserved residues in lighter shades of blue. The catalytic residues of TnpA IS605 and
the potential catalytic residues of E. coli MG1655 TnpA REP are shown in red above and below the alignment: histidine (H), hydrophobic (U),
tyrosine (Y), glutamine (Q) and asparagine (N).

DNA binding. Initial DNA substrates were based on
REPs and BIMEs (BIME II) from the REPtron present
in MG1655 (Figure 3A and Supplementary Figure S5).
Various 5'-end-labelled single- or double-strand substrates
carrying REP or BIME sequences were incubated with
purified TnpA REP in the absence of a divalent metal
ion and analysed by EMSA (Figure 3B). No retarded
band was observed with double-stranded substrates
(Supplementary Figure S6A). However, B268, an
ssBIME-carrying substrate from the 5' REPtron end
including a y REP and a z2 inverted REP sequence
(iREP), showed a retarded band (Figure 3B, lanes 2 and
3) as did a substrate with half of the iREP (z2) (B268i;
Figure 3B, lanes 5 and 6). Removal of the entire REP (y)
eliminated binding (B268b, lanes 8 and 9). Mutation of the
GTAG to ACGA (B268ii, lanes 11 and 12) or removal of
the mismatch in the REP (y), in which mutation of GC to TT
should allow formation of a perfect REP palindrome
sequence, eliminated detectable TnpA REP binding
(B268TT; lanes 14 and 15). Thus, both the GTAG
tetranucleotide and the mismatch in the REP sequence
are required for formation of robust TnpA REP-DNA
complexes.

Figure 3. Binding and cleavage activity on substrates derived from
the REPtron. (A) The E. coli MG1655 REPtron and oligonucleotides
representing ssBIMEs used in this analysis. The oligonucleotides used
are indicated with numbers above and below the cartoon. They
are summarized in Supplementary Table S1. (B) Binding activity
observed by EMSA. 5'-end-labelled oligonucleotides were incubated
in the binding buffer ('Materials and Methods' section) in the
absence or presence of 2 or 4 µM TnpA REP (shown as a triangle
above the gels). '-' indicates no added TnpA REP. The yellow box
represents a GTAG tetranucleotide mutated to ACGA. (C) Cleavage
of ssBIME derivatives. Arrow heads show the cleavage products.
5'-end-labelled oligonucleotides in the absence or presence of 4 µM
TnpA REP: B268 (116 nt), lanes 1-2; B268b (52 nt), lanes 3-4; B268i
(61 nt), lanes 5-6; B268ii (61 nt, mutated for GTAG), lanes 7-8;
B268TT (61 nt), lanes 9-10, respectively. (D) Structure and activity of
other substrates derived from the REPtron. On the right, binding and
cleavage activity are summarized (b = binding; cl = cleavage). Small
black arrows indicate the position of cleavage sites.

Cleavage. To determine whether TnpA REP also has DNA
cleavage activity, we incubated the 5'-end-labelled
oligonucleotides used for EMSA studies with TnpA REP in
reaction buffer containing Mn2+ or Mg2+ ('Materials
and Methods' section). The products were separated in a
denaturing sequencing gel. DsDNA was refractory to
cleavage. TnpA REP was active only on ssDNA substrates
and the reaction (using 3'-end-labelled substrates)
generated a covalent DNA-protein intermediate as
observed with TnpA IS608 (our unpublished data).
Cleavage was generally more efficient with Mn2+ than
with Mg2+ but no significant differences in cleavage
specificity were observed (Supplementary Figure S6B).
Except where stated, all assays presented here were
performed with ssDNA substrates in the presence of Mn2+.

B268, a 116-nt BIME-carrying substrate, underwent two
major cleavages at the 3' z2 iREP to generate two labelled
fragments of 85 and 55 nt (Figure 3C, lanes 1 and 2)
whereas B268i, a 61-nt oligonucleotide carrying the
entire y REP sequence but only part of the z2 iREP,
shared the first cleavage site with B268, giving a 51-nt
product (lanes 3 and 4). Even though the other B268
BIME derivatives showed no significant binding in
EMSA (Figure 3B), their activity was also examined since,
in the case of TnpA IS608, the fact that some DNA substrates
do not form stable complexes visible by EMSA did
not necessarily eliminate their capacity to undergo
cleavage (18). B268b, a 52-nt substrate with only the z2
iREP, was refractory to cleavage (lanes 5 and 6) as
was B268ii, the 61-nt partial BIME carrying a mutated
GTAG (Figure 3C, lanes 7 and 8). We confirmed this
using additional GTAG mutants derived from B268 or
other substrates. B268TT with a GC to TT mutation
which allows formation of a perfect REP palindrome
(Figure 3B, lanes 9 and 10) was also refractory to
cleavage. We obtained a similar result with B268GC
(mutation of AA to GC, not shown).

B269, an oligonucleotide complementary to B268
including a z2 REP and a y iREP sequence, underwent
two cleavages at the 3' iREP (Figure 3D). B269a, an
oligonucleotide derived from B269 carrying only part of the
y iREP, was also cleaved (Figure 3D). At first sight, this
appears to contrast with the behaviour of the IS200/IS605
family, where only the top strand is active (21,26).
However, this is clearly due to a different substrate
configuration: the two component REP sequences in a
BIME are inverted with respect to each other. Thus,
each DNA strand carries a REP and an iREP, permitting
cleavage on both strands.

We also examined the cleavage properties of the BIMEs 
located immediately 3' of tnpA REP. The results were
similar to those obtained with the 5' BIME. Cleavage
occurred 5' to the y REP on the top strand and to the
z2 REP on the bottom strand (Figure 3D). Additional
BIME variants from other E. coli chromosome regions
showed similar behaviour (Supplementary Figure S5).
TnpA REP catalysed cleavage of all three iREP variants,
y, z1 or z2, and sometimes also cleaved sites within the
linker sequence (Figure 3D, B270 and B270c) but only 
when the substrate also included a REP. 

Thus, the data demonstrate that a REP structure is
indispensable for BIME cleavage, presumably by
providing a binding site for TnpA REP. Moreover, they
show that TnpA REP recognises the REP with its 5'
conserved GTAG tetranucleotide and requires the
non-complementary base(s) in the stem for binding and
for activity but cleaves at the inverted sequence of the 5'
or 3' REP (iREP) and at the linker sequence with
the expected polarity (each cleavage resulting in a 5'
phosphotyrosine intermediate; unpublished data). The
results also demonstrate that cleavage can be either 5' or
3' to the essential REP sequence.

Defining the cleavage sites. Although REP and iREP 
portions of BIMEs are relatively well conserved, the 
linkers are highly variable (Supplementary Figure S1).
Like REPtron-derived substrates (Figure 3D), several of
these proved to contain cleavage sites in the linker region
although the efficiency of cleavage at these sites is variable.
Examples are B315, a partial BIME located at araD-A in
MG1655 (one site, Figure 4A), and B319, located at
gltP-yjcO (four sites, Figure 4A).

In summary, we observed two categories of cleavage
site: those on either side of iREP and those in the linker
regions. We refer to the first category as 'iREP' and to the
second as 'linker' cleavage sites. The pattern of cleavage
was similar in the presence of either Mn2+ or Mg2+
(Supplementary Figure S6).

Cleavage sites from each category obtained from a
set of naturally occurring BIME sequences are aligned in
Figure 4B. 'iREP' sites (Figure 4B left) are situated at
both sides of the iREP palindrome in relatively
conserved regions: the first type (I) occurred at
T(-4)G(-3)C(-2)C(-1)'T(1)G(2)A(3)T(4)/A(4) (where ' represents the
site of cleavage) and the second (II) at
G(-4)/T(-4) G(-3)/T(-3)/A(-3) C(-2)C(-1)'T(1)A(2)C(3) A(4)/C(4)/G(4),
within CTAC, the
complement of the conserved GTAG tetranucleotide.
Note that type I sites are present in z1 and z2 iREPs but
are absent in y iREP sequences.

The 'linker' sites (Figure 4B right) appear less
conserved. The deduced consensus reveals conserved
CC'T (coordinates C(-2)C(-1)'T(1)) for 'iREP' and C'T
(C(-1)'T(1)) sequences for 'linker' sites.

The importance of the C'T for cleavage. To understand the
rules governing BIME processing in more detail, we 
analysed the cleavage pattern of a panel of mutant sites 
introduced at the iREPI site in the 5' partial BIME (B268i) 
from the MG1655 REPtron (Figure 4C) and at the iREPII 
and linker sites in a second partial BIME (B315) 
(Figure 4D and E) present at the araD-A intergenic 
region. Mutations were introduced either in a block or 
individually in each site. Although this is not an 
exhaustive analysis, the results obtained show that the 
central C'T sequence was indispensable for cleavage as 
all substitutions at these positions prevented cleavage 
while mutations in some other positions were tolerated. 

Although we have not systematically measured the 
efficiency of cleavage at each site, sequences flanking the 
CT dinucleotide may influence cleavage efficiency. For 
example, the weak site B319(1) includes GACC'TACA
compared to the stronger B315(1) with GGCC'TACA:
G(-3) may therefore influence cleavage efficiency
(Figure 4A and B). In addition, mutation of T(-4) to any
base (B268i, iREPI site, Figure 4C) reduced cleavage in
B268ib, B268ic and B268id (Supplementary Figure S6).

Thus, 'iREP' and 'linker' sites appear to share similar 
sequence requirements indicating relatively limited 
cleavage specificity. 
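
The sequence requirement described above can be illustrated with a short scan for candidate C'T (or CC'T) positions in a single-stranded substrate. This is a sketch only: the function names are hypothetical, and it lists candidate positions by sequence alone. It does not model the requirement for a REP in the same substrate, nor the influence of flanking bases on efficiency.

```python
import re

# Minimal sketch: locate candidate cleavage positions defined by the central
# C'T dinucleotide (or the CC'T trinucleotide seen at 'iREP' sites).

def find_ct_sites(seq, require_cc=False):
    """Return 0-based indices of the T at position +1 for each C'T (or CC'T)."""
    pattern = "CCT" if require_cc else "CT"
    seq = seq.upper()
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() + len(pattern) - 1
            for m in re.finditer(f"(?={pattern})", seq)]


def mark_site(seq, cut):
    """Insert the ' symbol used in the text at a candidate cleavage position."""
    return seq[:cut] + "'" + seq[cut:]


strong_site = "GGCCTACA"   # B315(1) sequence quoted in the text
weak_site = "GACCTACA"     # B319(1) sequence quoted in the text
mutant = "GGCATACA"        # hypothetical substitution destroying the C'T

marked = mark_site(strong_site, find_ct_sites(strong_site)[0])
```

Both quoted sites yield the same candidate position, which is consistent with the observation that differences in efficiency arise from flanking bases such as G(-3) rather than from the central dinucleotide.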

Cleavage of single strand circular DNA. To confirm the 
requirement of C'T for cleavage and to assess the 
distance over which TnpA REP might act, we examined 
cleavage of a significantly longer DNA substrate, a 
single strand DNA circle derived from the 4.1-kb bacteriophage
f1-derived phasmid, pBluescript II SK, into which a
BIME had been cloned (pBS180, 'Materials and Methods'
section). Following cleavage with TnpA REP, DNA was
deproteinized and the cleavage sites were mapped by 
primer extension using two different primers ('Materials 
and Methods' section). The results are presented in 
Figure 5. In addition to bands resulting from two 
natural polymerase stalling sites which depend on the 
presence of a BIME (lanes 1 and 5), the substrate 
contains many cleavage sites stretching both upstream 
Figure 4. iREP and linker cleavage sites. (A) Examples of linker cleavage sites on B315 and B319. B315 and B319 are partial BIMEs derived from
MG1655 araD-A and gltP-yjcO regions. Linker cleavage sites are indicated by small blue arrows and numbered; asterisks indicate the position of
labelling. (B) Alignment of observed in vitro 'iREP' and 'linker' sites. 'iREP' and 'linker' sites are indicated by purple or blue arrows. Their sequence
is shown with the consensus sequence (cons) for each category below in bold. iREPI and iREPII sites occurred on both sides of the iREP
palindrome; the second overlapped the CTAC tetranucleotide (in green), the complement of the conserved GTAG tetranucleotide. The oligonucleotides
used are shown to the right of the sequences. (C) Characterization of the 'iREPI' (I) site using B268i as an example. Wild-type bases to the left of
the cleavage site are shown in purple. Those to the right are shown in black. Mutated bases are shown in red. '+' indicates cleavage, '-' indicates no
cleavage. Nucleotide designations are shown to the right. (D and E) Characterization of 'iREPII' (II) and 'linker' (L) sites using B315. Colour codes are
the same as for (C). [For B315, II and L correspond to sites B315(1) and B315(2) in Figure 4A.] For the B315 linker site, nucleotides to the left
of the cleavage site are shown in black and those to the right in blue. Mutated nucleotides are shown in red.



and downstream of the resident BIME over the entire 
region (>400nt) analysed (lanes 6 and 8). Moreover, 
cleavage is absolutely dependent on the presence of the 
functional BIME since no products are observed with a 
substrate carrying a BIME mutated for the conserved 
GTAG (compare lanes 2 and 4). Mapping these sites on 
the DNA sequence indicated that they occurred at a C'T 
dinucleotide. 



Thus, BIME-directed cleavage can occur at a considerable
distance from the TnpA REP binding site.

Recombination. A striking characteristic of BIME-2 and 
atypical BIMEs is the variation in the linker sequence and 
in copy number at a given locus in different strains 
suggesting that they may undergo recombination and 
amplification. To investigate this in vitro, we examined 
Figure 5. Cleavage of circular ssDNA. The substrate and primers used
for analysis are shown. ssDNA circles derived from pBS180 (substrate
with functional REP, lanes 1-2; 5-8) and pBS180mut (substrate with
REP mutated for GTAG, lanes 3-4) were incubated in the absence or
presence of TnpA REP in buffer containing MnCl2 and used for
primer extension with 5'-end-labelled oligonucleotide 'a' (lanes 1-6)
and 'b' (7-8). Lanes 1-2 and 5-6 correspond to the same samples 
separated under two different migration conditions on a 6% sequencing 
gel. Distances in nucleotides show the distance of the complementary 
oligonucleotide primer from the foot of the functional REP. Cleavages 
revealed by primers 'a' and 'b' are shown by red and black arrowheads, 
respectively. 



the capacity of TnpA REP to promote BIME recombin- 
ation with two sets of substrates. The first included 
ssBIMEs from the REPtron region (B268, B268i and 
B268ii; Figures 3 and 6A). In these experiments, we used 
a 20-fold molar excess of the unlabelled partner oligo- 
nucleotide to facilitate recombination. When 5' labelled 
116 nt B268 was incubated with TnpA REP, it was cleaved
at two 'iREP' sites generating 85 and 55 nt products
(Figure 6A, lanes 1 and 2). Addition of unlabelled 61 nt
B268i generated a 65 nt recombination product (lane 3).
This species was no longer observed in the reaction with
the inactive substrate B268ii mutated for GTAG
(Figure 6A, lane 4). These results demonstrate an
exchange of sequences between the partner DNA
molecules, and hence recombination between two
'iREP' sites.

The second substrate set included derivatives of an 
ssBIME from the araD-A region (B316, Figure 6B). 
5'-end-labelled 88 nt B316 was cleaved at an 'iREPII' and



a linker site, generating 76 and 69 nt products (Figure 6B, 
lanes 2-5). In the presence of unlabelled B316, a product 
of 95 nt resulting from recombination between the 'iREP' 
and the 'linker' sites was generated (Figure 6B, lane 2). 
We no longer observed this species on addition of
unlabelled B316a, mutated for GTAG, or of unlabelled
B316b, mutated at the 'iREPII' site (Supplementary
Table S1; Figure 6B, lanes 3 and 4, respectively). The
recombination product was still observed with cold B316c,
mutated at the 'linker' site but keeping the 'iREP' site
intact (Figure 6B, lane 5). However, this recombination
product was not generated when we used 5'-end-labelled
B316c in the presence of unlabelled B316c (Figure 6B,
lanes 6 and 7), confirming the nature of this reaction.
The products of certain recombination reactions could not
be distinguished since they reconstitute a fragment
having the length of the original labelled substrate
(e.g. Figure 6B). We also used the 5'-end-labelled 88 nt
B316 and oligonucleotide B315 (Figure 6C). In vitro, in
addition to the two cleavage products of B316, these substrates
generated a series of DNA species which migrated
high in the gel and had sizes consistent with recombination
products which include an additional REP together
with various combinations of linker sequences.
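
The product sizes reported for B316 can be checked with simple arithmetic. The Python sketch below assumes that a 5'-labelled recombination product consists of the 5' fragment of the labelled molecule (whose length equals the size of the corresponding cleavage product) joined to the 3' fragment of the unlabelled partner. The function name is hypothetical, and the assignment of the 76 and 69 nt positions to particular sites is not made here.

```python
from typing import Dict, List, Tuple


def labelled_product_sizes(sites_labelled: List[int],
                           len_partner: int,
                           sites_partner: List[int]) -> Dict[Tuple[int, int], int]:
    """Predict sizes of 5'-labelled strand-transfer products.

    Each site is given as the length (nt) of the 5' fragment released by
    cleavage at that site. The product of joining the labelled 5' fragment
    (cut at site a) to the partner's 3' fragment (cut at site b) has length
    a + (len_partner - b).
    """
    return {(a, b): a + (len_partner - b)
            for a in sites_labelled
            for b in sites_partner}


if __name__ == "__main__":
    # Labelled 88 nt B316 with unlabelled B316: cleavage products of 76 and 69 nt.
    sizes = labelled_product_sizes([76, 69], 88, [76, 69])
    for (a, b), n in sorted(sizes.items()):
        print(f"labelled cut at {a}, partner cut at {b}: {n} nt")
```

Under these assumptions, joining the 76 nt fragment to the 19 nt 3' fragment of a partner cut at the other site gives 95 nt, the size of the observed recombination product, whereas exchange between identical sites regenerates an 88 nt fragment indistinguishable from the substrate. The reciprocal combination would give 81 nt; this is a prediction of the arithmetic only and is not reported in the text.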

These results demonstrate recombination between iREP 
and linker sites which might result in BIME variability, 
multiplication and amplification. 



DISCUSSION 

tnpA REP, coding for a protein related to the Y1
transposases, was identified in association with REP/
BIME sequences in structures called REPtrons found in
a number of bacterial genomes. Here we compared the
REPtron structure and REP/BIME distribution in available
E. coli and Shigella genomes. We also analysed E. coli
K12 TnpA REP activity including cleavage and
recombination in vitro. While TnpA REP shared the same
general organization and similar catalytic characteristics
as the TnpA IS200/IS605 Y1 transposases, it exhibited
distinct properties potentially important in creation of
BIME variability and amplification. The presumed
importance of tnpA REP in REP/BIME evolution and
dispersion within genomes and multiple roles assumed 
by REPs and BIMEs themselves in cell physiology could 
be interpreted as domestication. Although many cases of 
domestication of eukaryotic transposable elements have 
been documented (42,43), such domestication has not 
yet been described for classical bacterial elements. 

REPtron evolution in E. coli and Shigella 

The tnpA REP gene was present in 74 of the 110 E. coli and
Shigella genomes available in the PATRIC database
(http://www.patricbrc.org/portal/portal/patric/Home)
(40,44), always as a single copy at the same genetic locus.
This genetic conservation implies that these sequences had
been acquired in a single event early in the last common
ancestor of the species which gave rise to present-day
strains. The phylogenetic analysis of E. coli/Shigella
strains lacking tnpA REP indicates that tnpA REP had been
present but had subsequently been replaced or deleted
(Supplementary Figure S4).

Although REPtron organization resembles that of 
members of the IS200/IS605 family, we do not believe 
that it represents a true transposable genetic element. Its 
unique genetic location indicates that it has not undergone 
subsequent rounds of transposition. This suggests that if 



the spread of REPs within a genome is catalysed by 
TnpA REP, it must occur by mobilization in trans.

We observed only minor differences in REP/BIME
copy number and distribution in strains with or without
tnpA REP (Supplementary Figure S4), in agreement with
the idea of REPtron/tnpA REP loss occurring late in the
radiation of these strains. This is in contrast to the



Nucleic Acids Research, 2012, Vol. 40, No. 8 3607 



previous observation in some other bacterial species of a
strong correlation between the presence of tnpA REP and
the increased number of REPs/BIMEs in corresponding
genomes (24), and may reflect a more recent REPtron/
tnpA REP acquisition event in these strains.

The REPtron distribution therefore clearly argues for
an origin which predates the radiation of E. coli/Shigella
and, since E. albertii appears to carry a similar (but not
identical) REPtron, it may predate the separation of this
species.

TnpA REP binding, cleavage and strand transfer activity 
in vitro 

To evaluate whether TnpA REP might be involved in the
recombination events leading to REP invasion and spread, 
we examined its activities in vitro. We demonstrated that it 
binds ssREP sequences and requires the conserved 5' 
GTAG tetranucleotide for this. It also catalyses BIME
cleavage both upstream and downstream of the REP 
sequence. The data suggest that the functional unit on 
which the enzyme acts is a (complete or partial) BIME 
rather than an individual REP or iREP. Cleavage 
requires an entire REP sequence and occurs at the iREP 
and in BIME linkers. Since BIMEs are composed of two 
inverted REP copies, this implies that TnpA REP can cleave 
both DNA strands, contrary to TnpA IS608/ISDra2 which
functions in a strand-specific manner. Moreover, the two
base pair mismatch located in the REP stem is also essential
for binding and activity.

Although TnpA REP forms a distinct clade within
the Y1 transposase family, it appears to share with
TnpA IS608 the same absolute requirement for ss substrates
(Supplementary Figure S6) and the formation of a
covalent protein-DNA intermediate (25) (unpublished
data). However, unlike TnpA IS200/IS605, the sequence specificity
for cleavage appears relatively low, requiring
only the dinucleotide CT while tolerating substitutions
in other surrounding positions. This limited cleavage specificity
may be responsible for BIME diversification.
Cleavage occurs in the presence of Mg2+ but is much
more pronounced with Mn2+. However, similar cleavage
patterns are observed with both metal ions
(Supplementary Figure S6).

We also observed strand transfer activity in vitro. 
Strand transfer can therefore create sequence variability 
and assemble tandem BIME copies resembling the tandem 
amplification observed in vivo (see below). 

Model of BIME variability and amplification 

In light of the observed distribution and sequence variabil- 
ity in the collection of E. coli strains, BIME colonization 



and expansion throughout genomes may occur as a 
two-step process involving diversification followed by 
amplification (Figure 7). There are several ways in which 
this might be accomplished. 

Diversity could be generated in an excision and 
insertion process by using different cleavage sites for 
excision and for insertion as shown in Figure 7A. From 
a BIME array carrying several 'linker' cleavage sites 
[Figure 7A(i)], cleavages at two distinct sites ('iREP' or 
'linker' types) and joining between Tyr-5'P generated
from the first and the 3'-OH from the second would lead
to excision of a ssBIME circle [Figure 7A(ii,iii)] in the
same way as has been shown for IS608 (25). Clearly, like
IS608 and ISDra2, this could involve ssDNA on the
lagging strand template of a replication fork but might 
also use ssDNA generated during R-loop formation, 
triggered by transcription-induced negative supercoiling 
(45), repair or by supercoil-driven extrusion of the REP 
secondary structure element (46). If processed at different 
cleavage sites [Figure 7A(iv)] before integration, this 
intermediate could give rise to several BIME variants 
[Figure 7A(v)]. For BIMEs, this process can occur on 
both DNA strands which in principle could carry different 
'linker' cleavage sites thus increasing the potential BIME 
sequence diversity. Degradation of the 3'-OH end by host
nucleases might also contribute to BIME variation. In 
this model, variation would be coupled to integration. 

Amplification could occur following insertion of the
excised BIME into a suitable target, such as that shown in
Figure 7B. This uses a 'rolling circle'-like recombination
mechanism on an inserted BIME (in either the head-to-head
or tail-to-tail orientation) which does not necessarily
involve rolling circle replication of an excised circular
product (see below). Using an example of an H-H
BIME, an initial cleavage at an 'iREP' site [Figure 7B(i)]
would generate a 3'-OH which could act as a primer and
be extended by host DNA polymerases [Figure 7B(ii)].
A second cleavage occurring at a 'linker' site 3' of
the REP on the newly synthesized DNA strand
[Figure 7B(iii)] would liberate another 3'-OH that
attacks the first Tyr-5'P complex [Figure 7B(iv)]. This
would lead to addition of a supplementary BIME unit 
[Figure 7B(v,vi)]. 

Alternatively, amplification could take place by a
mechanism involving rolling circle replication from two
BIMEs in tandem such as that proposed previously
[(24); Figure 7C]. Although this model is attractive and
would explain the amplification process, it requires four
TnpA REP-directed cleavages and might appear complex.
However, the model implies cleavages on BIMEs in both
strands [Figure 7C(ii,iii)] and might therefore explain the
necessity to maintain the BIME as a unit.



Figure 7. Continued 

model: The first cleavage at an 'iREP' site [B(i)] would generate a 3'-OH which could act as a primer and be extended by host DNA polymerases [B(ii)].
A second cleavage occurring at a 'linker' site 3' of the REP on the newly synthesized DNA strand [B(iii)] liberates another 3'-OH (in red) that attacks 
the first Tyr-5'P complex [B(iv), in grey]. This leads to addition of a supplementary BIME unit [B(v,vi)]. (C) Alternative model of BIME amplifi- 
cation [adapted from (24)]: From two BIMEs in tandem, two cleavages on the bottom strand, followed by reciprocal strand exchange [C(ii)] would 
lead to excision of the bottom central BIME. A 3'-OH resulting from a third cleavage on the top strand [C(iii)] would be used as primer for a 'rolling 
replication amplification' of the excised circular BIME. A fourth cleavage on the newly synthesized DNA strand [C(iv)] liberates another 3'-OH (in 
blue) that attacks the third Tyr-5'P complex (in pink). This leads to addition of one (or numerous) supplementary BIME(s) [C(vi)]. 
Note that the recombination we observe in vitro
between two 'iREP' sites (Figure 6A) and between an
'iREP' and a 'linker' site (Figure 6B) is equivalent to
steps Aii and Biv. We believe that the integration step (Aiv)
corresponds to a 'recombination' of the excised BIME with a
target carrying a REP/BIME. The ability of TnpA REP to
cleave at a significant distance upstream or downstream
of a resident BIME (Figure 5) would indeed enable
BIME insertion/recombination at a distance from an
existing BIME, thereby disseminating BIMEs on the
chromosome. This capacity would also allow the unit to
sequester additional flanking sequences including entire
genes or gene fragments, as has been observed for
codA-cynR from E. coli and tnpA REP from Pseudomonas
putida, Pseudomonas fluorescens, Stenotrophomonas
maltophilia and Mannheimia succiniciproducens (24)
(Supplementary Figure S7). Acquisition of neighbouring
genes is also characteristic of rolling circle (RC)
transposition of the IS91, ISCR and Helitron elements
(47,48).

While these data underline the potential plasticity
conferred on the host genome by the tnpA REP/BIME
system, neither the exact mechanism nor the inherent
frequency of TnpA REP-mediated BIME recombination
is at present known. Further bioinformatic approaches
are expected to reveal more details concerning BIME
spread through genomes. Moreover, mechanistic studies
would be greatly aided by knowledge of the TnpA REP
structure with and without its DNA substrate.
Additionally, it will be essential in the future to
develop an in vivo system to observe the activity of
the tnpA REP/BIME system within its host genome since
it is possible that TnpA REP activity requires host 
proteins and may be coupled to cell physiology such 
as replication, transcription or supercoiling. Such 
studies are underway. 

This class of enzyme is widespread. It includes 
transposases of the IS200/IS605 and IS91/ISCR families
of insertion sequences, TnpA REP, relaxases of conjugative
plasmids and proteins involved in the replication of rolling 
circle plasmids, phage and eukaryotic viruses. These 
studies raise important questions concerning the evolu- 
tionary relationship between transposable elements and 
their domestication in cell function. 